MRG-DBSCAN: An Improved DBSCAN Clustering Method Based on Map Reduce and Grid

Authors

  • Li Ma
  • Lei Gu
  • Bo Li
  • Shouyi Qiao
  • Jin Wang
Abstract

DBSCAN is a density-based clustering algorithm that groups data in high-density regions. Once the traditional DBSCAN algorithm finds a core object, it uses that object as a center and expands outward continuously. As the number of core objects grows, unprocessed objects are retained in memory, which consumes a large amount of memory and incurs I/O overhead, so the algorithm's efficiency is low. To improve the efficiency of DBSCAN and reduce its memory footprint, this paper improves the original DBSCAN algorithm and proposes the G-DBSCAN algorithm. G-DBSCAN reduces the number of query objects used as starting points: the data are placed into a grid, and the center point of the data in each grid cell replaces all the points in that cell as the algorithm's input. The number of query objects is thereby drastically reduced, which improves efficiency and reduces the memory footprint. To adapt G-DBSCAN to large-scale data processing, we parallelize it and combine it with the MapReduce framework, yielding MRG-DBSCAN. The results show that the G-DBSCAN and MRG-DBSCAN algorithms are feasible and effective.
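The grid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no details on cell size, on how density is counted for cell centers, or on how the MapReduce job is organized. The function names (`grid_representatives`, `dbscan`, `grid_dbscan`), the square cells of side `cell_size`, and the choice to apply `min_pts` to cell centers rather than to original points are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist, floor


def grid_representatives(points, cell_size):
    """Bucket points into square grid cells (assumed layout) and return
    one center per non-empty cell plus the point indices in that cell."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(floor(c / cell_size) for c in p)
        cells[key].append(idx)
    centers, members = [], []
    for key in sorted(cells):
        idxs = cells[key]
        dim = len(points[idxs[0]])
        # The center is the mean of the points that fall in the cell.
        center = tuple(sum(points[i][d] for i in idxs) / len(idxs)
                       for d in range(dim))
        centers.append(center)
        members.append(idxs)
    return centers, members


def dbscan(points, eps, min_pts):
    """Plain DBSCAN with brute-force neighbor search; label -1 means noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core object, keep expanding
    return labels


def grid_dbscan(points, cell_size, eps, min_pts):
    """Cluster the cell centers, then copy each center's label to its points."""
    centers, members = grid_representatives(points, cell_size)
    center_labels = dbscan(centers, eps, min_pts)
    labels = [-1] * len(points)
    for label, idxs in zip(center_labels, members):
        for i in idxs:
            labels[i] = label
    return labels


points = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.3), (1.1, 0.2), (1.2, 0.8),
          (10.1, 10.1), (10.9, 10.2), (11.1, 10.3), (50.0, 50.0)]
print(grid_dbscan(points, cell_size=1.0, eps=1.5, min_pts=2))
```

In this toy run, nine input points collapse to five cell centers, so the clustering step issues five neighborhood queries instead of nine. The bucketing step is a group-by-key operation, which is the general shape a map phase (emit cell key and point) and a reduce phase (average the points per key) would take, though how the paper actually partitions the work is not stated in the abstract.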


Similar resources

An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data

DBSCAN is a commonly used clustering algorithm because it can find arbitrarily shaped clusters and is robust to outliers. Generally, the complexity of DBSCAN is O(n^2) in the worst case, and in practice it becomes more severe in higher dimensions. Grid-based DBSCAN is one of the recent improved algorithms that aim to increase efficiency. However, the performance of grid-based...


Grid-based Data Stream Clustering for Intrusion Detection

As a stream data mining method, stream clustering has great potential in areas such as network traffic analysis and intrusion detection. This paper proposes a novel grid-based clustering algorithm for stream data that combines the advantages of grid mapping and the DBSCAN algorithm. The algorithm adopts the two-phase model and in the online phase, it maps stream data into a grid and the ...


A New Clustering Algorithm Based on Near Neighbor Influence

Clustering has been used in many areas. It is an unsupervised learning method that tries to find distributions and patterns in unlabeled data sets. Although clustering algorithms have been studied for decades, none of them is all-purpose. This paper presents a new clustering algorithm, Clustering based on Near Neighbor Influence (CNNI), an improved version in time cost of CNNI algorithm (...


A Survey of the Problems of the DBSCAN Clustering Algorithm and a Review of the Improvements Proposed for It

Clustering is an important knowledge discovery technique in databases. Density-based clustering algorithms are one of the main methods for clustering in data mining. These algorithms have some special features, including independence from the shape of the clusters, high understandability, and ease of use. DBSCAN is a base algorithm for density-based clustering. DBSCAN is able to...


Improvement of density-based clustering algorithm using modifying the density definitions and input parameter

Clustering, the grouping of similar samples, is one of the main tasks in data mining. There is a wide variety of clustering algorithms, and density-based clustering is one category of them. Various algorithms have been proposed in this category; one of the most widely used is DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...




Publication date: 2015